Tuberculosis (TB) is a disease that can cause death if not treated early. 
Ensemble deep learning can aid early TB detection. Previous work trained 
the ensemble classifiers on images with similar features only. An ensemble 
requires a diversity of errors to perform well, which is achieved using either 
different classification techniques or different feature sets. This paper focuses on the 
latter and presents TB detection using deep learning and contrast-enhanced Canny 
edge detected (CEED-Canny) x-ray images. CEED-Canny 
was used to produce edge detected images of the lung x-rays. Two types of 
features were generated: the first was extracted from the Enhanced x-ray 
images, and the second from the Edge detected images. The proposed 
variation of features increased the diversity of errors of the base classifiers 
and improved TB detection. The proposed ensemble method achieved an 
accuracy of 93.59%, sensitivity of 92.31% and specificity of 
94.87%, comparable with previous work. 
1. INTRODUCTION 

Tuberculosis is caused by the bacterium Mycobacterium tuberculosis. The World Health 
Organization states that about ten million individuals were infected with TB in 2018, with approximately 
25,387 cases reported in Malaysia [1]. Of these, about 1.5 million died from TB in 2018. 

TB is a treatable infection, and it can be detected by examining the chest x-rays of patients. 
Early diagnosis of TB is essential for raising the likelihood of recovery [2]. Nevertheless, there are 
two main challenges for TB detection. First, lung cancer and TB look similar, which makes it difficult for a 
radiologist to differentiate between the two [3]. Second, there is a shortage of expert radiograph 
readers in high-TB-burden areas [4]. Therefore, a semi-automated TB detection system that can support 
medical diagnosis is needed to provide better healthcare to society [5]. 

Several works on semi-automated TB detection using machine learning can be found in the 
literature. Lately, features from medical images have been analyzed using deep learning techniques [6]. 
Deep learning can identify features hierarchically: the lower-level features help generate the higher-level 
features. The ability of deep learning to identify high-level features has been shown to produce better classification 
results [7]. Numerous works using deep learning for TB detection on chest x-rays can be found in [8-19]. 
Some studies used ensemble techniques, whereby more than one classifier is used to make 
predictions. Ensemble classifiers can be more accurate than any of their single classifiers [20]. 
Further work on TB detection using ensemble techniques was presented in [21-23]. 

The literature shows that deep learning performs well in TB detection. Most of the 
works presented in the literature utilized features extracted from the original chest x-ray 
images [11-12, 14-19, 22]. Because the performance of a classifier is influenced by the features used, 
training the classifiers on a variety of features may generate better TB detection models, and this should be 
investigated. An ensemble of classifiers performs well when the base classifiers have a diversity of errors [24], 
which means that the errors of the base classifiers have a low correlation. One way to achieve this condition 
is to use diverse classifiers: two classifiers are said to be diverse if they make different errors on new values. 
For example, consider an ensemble of three classifiers, f1, f2 and f3, and a new image x. If 
these classifiers are not diverse, then when f1(x) misclassifies, f2(x) and f3(x) will also misclassify. Conversely, 
if the errors made by the classifiers are uncorrelated, then when f1(x) misclassifies, f2(x) and f3(x) might still be 
correct. Diverse classifiers can be obtained either by using different classification techniques or by using different 
sets of features. Most of the works on ensemble classifiers in the literature focus on the former. 
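The effect of error diversity can be illustrated with a toy simulation. This is a hypothetical sketch: the ten labels and the positions of the errors are invented for illustration, the helper `flip` simply builds a classifier output that is wrong on chosen images, and hard majority voting is used for simplicity instead of the average probability scoring of Section 2.3.

```python
def accuracy(pred, truth):
    """Fraction of images whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(*preds):
    """Combine 0/1 predictions (0 = Normal, 1 = TB) by majority; three voters never tie."""
    return [1 if sum(votes) > len(votes) / 2 else 0 for votes in zip(*preds)]


def flip(truth, wrong_idx):
    """Hypothetical classifier output: correct everywhere except at wrong_idx."""
    return [1 - t if i in wrong_idx else t for i, t in enumerate(truth)]


truth = [1] * 5 + [0] * 5  # five TB images followed by five Normal images

# Diverse: f1, f2 and f3 each err on different images.
diverse = [flip(truth, {0, 5}), flip(truth, {1, 6}), flip(truth, {2, 7})]
# Not diverse: all three classifiers err on the same images.
correlated = [flip(truth, {0, 5})] * 3

for name, group in [("diverse", diverse), ("correlated", correlated)]:
    individual = [accuracy(p, truth) for p in group]
    ensemble = accuracy(majority_vote(*group), truth)
    print(name, individual, ensemble)
```

Every base classifier is 80% accurate in both groups, but only the diverse group lets the other two classifiers outvote each individual mistake, so its ensemble corrects all errors while the correlated ensemble repeats them.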

The process of identifying the boundaries of objects in an image is known as edge detection. 
Edges are defined by detecting discontinuities in brightness. Edge detected images provide essential features 
for image analysis [25-26]. There are many edge detection methods reported in the literature. Traditional 
gradient-based operators such as Sobel, Roberts and Prewitt were the first constructed for edge detection, but they are 
susceptible to noise and do not produce sharp edges [27]. Laplacian operators, another method to detect 
edges, tend to detect false edges and suffer from severe localization errors on curved edges [25]. John F. Canny proposed 
an algorithm known as Canny edge detection in 1986, and his technique is considered the optimal edge 
detection technique for images that are degraded by noise [28]. However, the conventional Canny edge 
detection technique suffers from a few weaknesses. One of them is that Gaussian filtering 
smooths the important edges while removing the noise [29]. As a result, the edges are weakened. 

This paper presents an approach to TB detection using deep learning and contrast-enhanced Canny 
edge detected (CEED-Canny) x-ray images. CEED-Canny is a modified Canny algorithm that detects the edges 
of the x-ray images and was used to generate the edge detected images. Previous work extracted features 
from the original chest x-ray images only. Our method extracts features from both the enhanced chest 
x-ray images and the edge detected images. The aim is to increase the diversity of errors among the base 
classifiers of the ensemble. We hypothesized that using different types of features with ensemble 
classifiers would produce higher TB detection accuracy, sensitivity and specificity. The contributions of this 
paper are thus: 

a. Using the CEED-Canny technique, which combines Canny edge detection and local morphological contrast 
enhancement, to detect edges in chest x-ray images. 

b. Generating ensemble CNN classifiers for TB detection using two sets of images, namely the Enhanced 
images of chest x-rays and the chest Edge images detected from the Enhanced x-rays. 


2. RESEARCH METHOD 

There are three main phases in our research work: 
a. Image preparation, 
b. Classifiers generation, 
c. Ensemble classification. 
In the image preparation phase, resizing, contrast limited adaptive histogram equalization (CLAHE) and 
CEED-Canny were performed, producing two sets of images: the Enhanced images and the Edge 
images. These images were used for training. In the classifier generation phase, selected CNN 
architectures were used to generate classifiers. Each classifier extracted features from either the Enhanced or the Edge 
images and learned to recognize TB-infected lungs and healthy lungs. In the ensemble phase, the predictions 
made by the individual classifiers were combined using average probability scoring. Details of each phase are 
described in sub-sections 2.1 to 2.3. Sub-section 2.4 presents the performance measures. 


2.1. Image preparation phase 

There are three operations involved in the image preparation phase. First, all the images were resized 
to 250 x 250 pixels. This reduces the computational workload, as the computational cost increases 
with image size, and it ensures that all images are of the same size and match the CNNs' input size. 
Second, CLAHE was applied to the resized original x-ray images to produce the Enhanced dataset. 
This operation improves the contrast of the x-ray images. Third, CEED-Canny, 
described in detail in section 2.1.1, was applied to the Enhanced dataset to generate the Edge dataset. 


Int J Artif Intell, Vol. 9, No. 4, December 2020: 713 — 720 


Int J Artif Intell ISSN: 2252-8938


The Edge dataset was used because we conjectured that images of TB-infected lungs might contain more 
uncommon edges than images of healthy lungs. At the end of this phase, two sets of images were generated, 
namely the Edge images and the Enhanced images. Both sets of images were used to generate classifiers. 


2.1.1. Contrast-enhanced Canny edge detection for chest X-ray images 

The CEED-Canny method combines local morphological contrast enhancement and the Canny edge 
detection technique. Morphology is an image processing technique that processes images based on shapes 
known as structuring elements. In morphological operations, a structuring element is applied to an input 
image, producing an output image of the same size as the input image. The intensity value of 
each pixel in the output image is determined by comparing the corresponding pixel in the input image with 
its neighbors. The morphological contrast enhancement filter replaces the central pixel by the local 
maximum if the original pixel value is closer to the local maximum; otherwise, it uses the local minimum. 
The Canny edge detector consists of several steps. The first step is noise reduction: the noise in the 
image is reduced by convolving the input image with a Gaussian filter. The second step is finding intensity gradients 
to detect edges where the change in grayscale intensity is maximal. The third step is non-maximum 
suppression, which retains all local maxima in the gradient image and eliminates any pixels that may 
not be part of an edge. The final step is hysteresis thresholding, which determines which 
detected edges are true edges and which are false. More information about Canny edge detection can be 
found in [29-30]. 

Figure 1 shows the pseudocode of CEED-Canny. For a grayscale image, morphological contrast 
enhancement is performed first, followed by Canny edge detection. Morphological contrast enhancement 
replaces the central pixel by the local maximum if the original pixel value is closer to the local maximum, 
and otherwise by the local minimum. Canny edge detection is then applied to detect the edges present in the 
image. Figure 2 displays samples of a chest x-ray image before and after the application of CEED-Canny. 


Input
- Load grayscale image
Process
- FOR all pixels
    - Get original pixel value, P
    - Find local maximum, Pmax
    - Find local minimum, Pmin
    - Compute difference between original pixel value and local maximum, D1
    - Compute difference between original pixel value and local minimum, D2
    - IF D1 < D2
          Po = Pmax
      ELSE
          Po = Pmin
- Perform noise reduction using Gaussian filter
- Compute intensity gradient of the image
- Perform non-maximum suppression
- Set double threshold values
- Apply hysteresis thresholding
Output
- Edge image


Figure 1. Pseudocode for CEED-Canny 


Figure 2. Chest x-ray image (left) before CEED-Canny is applied and (right) after CEED-Canny is applied 


2.2. Classifiers generation phase 
The process of generating a classifier using a CNN is as follows. First, the image dataset was divided 
into two subsets, namely the training set and the testing set. Next, the CNN was used to extract features from the 
training set images and learn to recognize Normal and TB lungs. After training was completed, 
a classifier was produced. This classifier was used to make predictions on every image in the test set, and the 
results were recorded. The performance of the classifier was evaluated using a confusion matrix, which was 
used to determine the sensitivity, specificity and accuracy of the classifier. 


2.3. Ensemble phase 

When a classifier predicts the label of an image, it outputs a probability score in the range of 0 to 1. 
If the score is lower than 0.5, the image is labeled as Normal; otherwise, it is labeled as TB. For ensemble 
classification, average probability scoring was used: the probability scores of the individual CNN classifiers 
were averaged, and the resulting average probability score defines the final label of the image [10]. 
The following formula was used to determine the ensemble's final score: 


$$\text{final score} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where x_i is the probability score of the i-th classifier and n is the number of classifiers in the ensemble. 
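A minimal sketch of average probability scoring and the 0.5 decision rule described above; the three scores in the usage lines are hypothetical.

```python
from typing import Sequence


def ensemble_score(scores: Sequence[float]) -> float:
    """Final score: the sum of the classifiers' probability scores divided by n."""
    if not scores:
        raise ValueError("need at least one classifier score")
    return sum(scores) / len(scores)


def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Scores below the threshold are labeled Normal; otherwise TB."""
    return "Normal" if score < threshold else "TB"


# Hypothetical scores from three base classifiers for one image.
scores = [0.62, 0.41, 0.58]
final = ensemble_score(scores)
print(round(final, 4), label_from_score(final))
```

Because the ensemble averages probabilities rather than counting hard votes, a confident classifier can outweigh two classifiers that are only slightly on the other side of 0.5.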


2.4. Performance measure 

The presented work requires two types of performance measures. The first is to evaluate the 
effectiveness of the CEED-Canny in detecting the chest edges from a chest x-ray image. The second is to 
measure the performance of TB detection. 

The effectiveness of CEED-Canny was evaluated using the mean square error (MSE). MSE was 
used because it provides an impartial assessment of the perceptual quality of images [31], and much previous 
work used MSE to evaluate proposed image processing techniques [29, 32-33]. The MSE is the average 
of the squared errors between the original image and the noisy image, where the error is the amount by which 
the pixel values of the original image differ from those of the noisy image. The lower the MSE, the more closely 
the restored image matches the original image, which demonstrates the performance of the restorative 
algorithm. MSE is defined as follows: 


$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[f(i,j) - g(i,j)\right]^2$$


where f denotes the matrix data of the original image, g denotes the matrix data of the degraded image, 
m denotes the number of rows of pixels in the images, i denotes the row index, n denotes the number 
of columns of pixels in the images and j denotes the column index. 
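A direct NumPy translation of the MSE formula, offered as a sketch. Casting to floating point before subtracting matters for 8-bit images: with `uint8` arithmetic, `0 - 255` would wrap around instead of giving -255.

```python
import numpy as np


def mse(f, g) -> float:
    """Mean square error between two equally sized images f and g."""
    f = np.asarray(f, dtype=np.float64)  # cast first so uint8 differences cannot wrap
    g = np.asarray(g, dtype=np.float64)
    if f.shape != g.shape:
        raise ValueError("images must have the same shape")
    # np.mean over all m*n pixels is the 1/(mn) double sum in the formula.
    return float(np.mean((f - g) ** 2))


print(mse([[0, 0], [0, 0]], [[1, 2], [3, 4]]))
```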

The metrics selected to evaluate the performance of this approach were sensitivity, specificity and 
accuracy. Sensitivity measures how well the classifier identifies positive cases, whereas specificity 
measures how well the classifier identifies negative cases. Accuracy measures how well the classifier 
predicts both labels. In this paper, TB was treated as the positive case and non-TB as the negative 
case. 
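The three metrics follow directly from the confusion matrix counts (TP, FN, TN, FP). The counts in the usage line are an assumption for illustration: with 39 TB and 39 Normal test images (10% of 394 per class, rounded down), 36 true positives and 37 true negatives reproduce the 92.31%, 94.87% and 93.59% reported for the best ensemble.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) with TB as the positive class."""
    sensitivity = tp / (tp + fn)                # share of TB images labeled TB
    specificity = tn / (tn + fp)                # share of Normal images labeled Normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all images labeled correctly
    return sensitivity, specificity, accuracy


# ASSUMED counts: 39 TB and 39 Normal test images.
sens, spec, acc = classification_metrics(tp=36, fn=3, tn=37, fp=2)
print(f"{100 * sens:.2f} {100 * spec:.2f} {100 * acc:.2f}")  # 92.31 94.87 93.59
```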


3. RESULTS AND ANALYSIS 
This section describes the datasets used, the experimental setup, the results and the discussion of the 
results. 


3.1. The datasets 

Two public TB chest x-ray image datasets were used in this paper: the Montgomery 
and Shenzhen datasets [34]. The Montgomery dataset contains 138 images, of which 58 are of 
TB-infected lungs and 80 are of healthy lungs. The image resolution is either 4020x4892 pixels or 
4892x4020 pixels. The Shenzhen dataset contains 662 images, of which 336 are of TB-infected lungs 
and 326 are of healthy lungs. The image resolution is approximately 3000x3000 pixels. Both 
datasets provide images in PNG format. Combining both datasets gives a total of 394 TB-infected 
lungs and 406 healthy lungs. Twelve healthy lung images were randomly discarded so that the 
number of images in each category is the same. Of these images, 90% were allocated for training 
and the remaining 10% for testing. This train-test split proportion is identical to that of the works 
in [18] and [35]. 
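The balancing and 90/10 split can be sketched as below. Splitting each class separately (so the test set stays balanced) and the fixed random seed are assumptions, as the text only states the overall proportions; the image identifiers are placeholders.

```python
import random


def balanced_split(tb_ids, normal_ids, test_fraction=0.1, seed=42):
    """Equalize class sizes by random discarding, then split each class 90/10.

    The per-class (stratified) split and the seed are ASSUMPTIONS for illustration.
    """
    rng = random.Random(seed)
    n = min(len(tb_ids), len(normal_ids))
    # rng.sample keeps n randomly chosen images per class, in random order,
    # which discards the surplus healthy images.
    tb = rng.sample(list(tb_ids), n)
    normal = rng.sample(list(normal_ids), n)
    n_test = round(n * test_fraction)  # test images per class
    train = [(i, "TB") for i in tb[n_test:]] + [(i, "Normal") for i in normal[n_test:]]
    test = [(i, "TB") for i in tb[:n_test]] + [(i, "Normal") for i in normal[:n_test]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


tb_ids = [f"tb_{k}" for k in range(394)]          # placeholder identifiers
normal_ids = [f"normal_{k}" for k in range(406)]  # placeholder identifiers
train, test = balanced_split(tb_ids, normal_ids)
print(len(train), len(test))
```

With 394 images per class, this assumed scheme yields 355 training and 39 test images per class (710 and 78 in total).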
3.2. Experimental setup 

Three experiments were conducted to measure the performance of the proposed approach. The first 
experiment evaluated the performance of the proposed CEED-Canny method, while the second 
experiment evaluated the performance of TB detection using the features extracted from the Enhanced 
images and Edge detected images. These features were used to train CNN classifiers, and the 
classifiers then made predictions on the test images. Two CNN architectures were used, VGG16 
[36] and InceptionV3 [37]. VGG16 was trained on the Enhanced dataset and the Edge dataset, producing two 
classifiers. Likewise, InceptionV3 was trained on these two datasets, producing another two 
classifiers. The third experiment compared the performance of the individual classifiers and the 
ensembles of classifiers. The Keras [38] implementations of VGG16 and InceptionV3 were used, and these 
CNNs were pre-trained on the ImageNet dataset. 


3.3. Performance of contrast-enhanced Canny in detecting edges 

In this set of experiments, we compare the performance of the proposed CEED-Canny and the 
original Canny. Four sample images were selected for the tests. The first two images were Lena and 
Cameraman, which are standard test images in the image processing community. The third and 
fourth images were taken from the Shenzhen dataset. Table 1 shows the results of the MSE test. CEED-Canny 
produced lower MSE values than Canny on all four samples, which indicates that the edge images 
produced by CEED-Canny are more accurate than those of the original Canny. 


Table 1. Result of MSE Test for Canny and CEED-Canny methods 


Sample    Canny MSE    CEED-Canny MSE
1         239.88       232.51
2         214.32       208.35
3         216.49       197.33
4         244.82       230.98


3.4. Individual classifier test results 

Two sets of images were used: the Enhanced images and the Edge images. In this experiment, all the test images were 
classified as either TB or non-TB. The learning rate of the classifiers was set to 0.0001 
and the number of epochs to 2000. Preliminary tests were conducted to determine the optimum learning 
rate and number of epochs; these are not reported in this paper due to space constraints. Table 2 displays 
the experimental results. Based on Table 2, the highest accuracy and specificity, 91.03% and 92.31% respectively, were 
achieved using VGG16 on the Enhanced images. The best sensitivity of 89.74% was recorded by VGG16 on both the 
Enhanced and Edge images, and by InceptionV3 on the Edge images. 


Table 2. Performance of different CNN classifiers to detect TB on enhanced and edge images 


Image type       CNN classifier    Sensitivity (%)    Specificity (%)    Accuracy (%)
Enhanced         VGG16             89.74              92.31              91.03
Enhanced         InceptionV3       84.62              79.49              82.05
Edge detected    VGG16             89.74              89.74              89.74
Edge detected    InceptionV3       89.74              74.36              82.05


3.5. Ensemble classifier test results 

In the third experiment, classifications were performed using ensembles of the classifiers, and all possible 
combinations were tested. Table 3 displays the results obtained on the test data, where A represents VGG16 
trained on the Enhanced images, B represents VGG16 trained on the Edge images, C represents InceptionV3 
trained on the Enhanced images and D represents InceptionV3 trained on the Edge images. 
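The eleven ensembles in Table 3 are exactly the subsets of {A, B, C, D} with at least two members, which can be enumerated with the standard library as a quick check. The descriptive strings below simply restate the letter legend above.

```python
from itertools import combinations

# Letter legend for the four base classifiers.
MODELS = {
    "A": "VGG16 on Enhanced images",
    "B": "VGG16 on Edge images",
    "C": "InceptionV3 on Enhanced images",
    "D": "InceptionV3 on Edge images",
}


def ensemble_combinations(names):
    """All subsets with at least two members, ordered by size, then lexicographically."""
    return ["".join(c) for r in range(2, len(names) + 1) for c in combinations(names, r)]


combos = ensemble_combinations(sorted(MODELS))
print(len(combos), combos)
```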

Based on Table 3, every ensemble combination produces at least the same, if not better, accuracy 
than either individual InceptionV3 classifier. The ensemble combination with the highest accuracy is 
ABC, at 93.59%, with sensitivity and specificity of 92.31% and 94.87%, respectively. Two ensemble 
combinations, ABC and ACD, produced the same highest sensitivity of 92.31%. Sensitivity is considered 
pertinent in medical image analysis because we want to reduce false negatives. 
Table 3. The Performance of Ensemble CNN to Detect TB 


Ensemble Sensitivity (%) Specificity (%) Accuracy (%) 
AB 89.74 92.31 91.03 
AC 79.49 89.74 84.62 
AD 87.18 84.62 85.90 
BC 84.62 79.49 82.05 
BD 87.18 84.62 85.90 
CD 89.74 76.92 83.33 

ABC 92.31 94.87 93.59 
ABD 89.74 92.31 91.03 
ACD 92.31 89.74 91.03 
BCD 87.18 87.18 87.18 
ABCD 89.74 89.74 89.74 


3.6. Discussion of results 

The MSE test shows that the MSE values produced by CEED-Canny are lower than those produced by 
Canny. This indicates that the edge images produced by CEED-Canny are more accurate than those of the 
traditional Canny, and suggests that they provide more informative features for classifier 
generation. The individual classifier test results show the potential of using the edge features detected by 
CEED-Canny for detecting TB. The ensemble test shows that combining the predictions of 
multiple classifiers generated from different datasets can outperform single classifiers: the best ensemble, 
ABC, exceeded every individual classifier in Table 2 on all three metrics. The results show that higher 
sensitivity, specificity and accuracy can be achieved when using more than one type of image, and that 
extracting features from different types of images can produce better TB detection performance. 
The ensemble ABC produced the best performance, with accuracy, sensitivity and specificity of 
93.59%, 92.31% and 94.87%, respectively. 

A direct comparison with other works in the literature is not possible because they use different datasets 
and experimental setups. For example, TB detection on the Shenzhen and Montgomery datasets was performed 
separately in [9], where the accuracies recorded for the two datasets were 95.57% and 78.3%, respectively. 
In [11], a sensitivity of 97.3% was recorded, but with an additional dataset and extensive augmentation, 
including random cropping, mean subtraction, mirror images, rotations and CLAHE. The other works [10, 16, 22] produced 
accuracies lower than that of the work presented in this paper. 


4. CONCLUSION 

This paper presents Tuberculosis detection using deep learning and contrast-enhanced Canny edge 
detected x-ray images. Previous ensembles combined only CNNs trained on similar features, which limited 
the performance of the classifiers. We present ensembles that combine CNNs trained on different features 
extracted from different sets of images, the Enhanced and Edge images. We used CEED-Canny to produce 
the edge images, and VGG16 and InceptionV3 were selected as the classifiers. The results indicate that 
ensembles of classifiers trained on multiple types of features extracted from different types of images 
improved the detection accuracy, sensitivity and specificity, which supports the hypothesis of this work. 
In future work, we would like to extend the scope to classify chest x-ray images based on TB severity 
and to investigate more features that could further improve the performance of the classifiers. 